Abstract: We propose a new construction for low-density source codes with multiple
parameters that can be tuned to optimize the performance of the code. In
addition, we introduce a set of analysis techniques for deriving upper bounds for the
expected distortion of our construction, as well as more general low-density
constructions. We show that (with an optimal encoding algorithm) our codes achieve the
rate-distortion bound for a binary symmetric source and Hamming distortion. Our
methods also provide rigorous upper bounds on the minimum distortion achievable
by previously proposed low-density constructions.

1 Introduction

While low-density parity check (LDPC) codes can provably approach the channel
coding capacity [?] for point-to-point transmission, currently there are relatively
few theoretical results on low-density codes for lossy source coding, channel coding
with encoder side information, and source coding with decoder side information.
Note that the latter three scenarios all involve some aspect of quantization. Even
though quantization and error correction are closely related, the standard LDPC
constructions used for channel coding generally fail [?]. One viable option is
trellis-based quantization (TCQ) [7], which has been used both for lossy source coding
and for distributed source coding [13, ?, ?]. However, saturating fundamental
bounds with TCQ requires taking the constraint length to infinity [?], which incurs
exponential complexity even for message-passing decoders/encoders. Consequently,
it is of considerable interest to develop low-density constructions that are also capable
of saturating the information-theoretic bounds.

Previous work [?] has shown that low-density generator matrix (LDGM) codes,
which are dual to LDPC codes, are provably optimal for binary erasure quantization
(a special type of source coding). This motivates the use of LDGM codes and variants
for more general compression problems. Indeed, recent work [3, ?, ?] has shown
empirically that LDGM codes, in conjunction with variants of sum-product
message-passing for encoding, can approach the rate-distortion bound for a binary symmetric
source (BSS). In addition, non-rigorous replica or cavity method calculations [3, ?]
also suggest that the theoretical performance of LDGM codes is close to optimal.

This paper makes two primary contributions to this area. First, we propose a new
low-density construction for lossy source coding with multiple parameters that can
be tuned to optimize the performance of the code. Our construction includes as a
special case the ordinary LDGM codes examined previously [3, ?, ?, 12]. Second, we
develop methods useful for analyzing the expected distortion of our constructions as
well as more general lossy source codes. Using these methods, we show that (with
optimal quantization) our codes saturate the rate-distortion bound for a uniform
binary source and Hamming distortion. As we will show in a longer version of this
paper, our methods also lead to rigorous upper bounds on the distortion achievable
by a standard LDGM construction. Thus, provided that a low complexity iterative
encoding algorithm can be found, our results suggest that low density codes can
provide significant improvements for a wide class of quantization problems.

The remainder of this paper is organized as follows. After introducing some
notation, we describe our new low density generator matrix construction in Section 2 and
bound its performance in Section 3. In particular, we describe the tools required to
analyze low density source codes through a series of lemmas, which we believe
illustrate the main insights of the paper. Finally, we close with some concluding remarks
in Section 4 and postpone all proofs to the appendix.

Notation: Vectors/sequences are denoted in bold (e.g., $\mathbf{s}$), random variables in sans
serif font (e.g., $\mathsf{s}$), and random vectors/sequences in bold sans serif (e.g., $\boldsymbol{\mathsf{s}}$). Similarly,
matrices are denoted using bold capital letters (e.g., $\mathbf{G}$) and random matrices with
bold sans serif capitals (e.g., $\boldsymbol{\mathsf{G}}$). We use $I(\cdot;\cdot)$, $H(\cdot)$, and $D(\cdot\|\cdot)$ to denote mutual
information, entropy, and relative entropy (Kullback-Leibler distance), respectively.
Finally, we use $\mathrm{card}\{\cdot\}$ to denote the cardinality of a set, $\|\cdot\|_p$ to denote the $p$-norm
of a vector, and $H_b(t)$ to denote the entropy of a Bernoulli($t$) random variable.

2 The Compound Construction 

The construction considered in the paper is illustrated in Fig. 1: the top section
consists of an LDGM code $C_t$ of rate $R_t = m/n$ with $n$ source bits and $m$ information
bits, whereas the bottom section consists of an LDPC code $C_b$ of rate $R_b = 1 - k/m$ with
$m$ bits constrained by $k$ checks. The compound code formed by joining the top and
bottom code can generate $2^{R_b m} = 2^{m-k}$ possible source reconstructions of length $n$,
so that the overall code $C$ has rate $R = R_t R_b$. Note that a check-regular LDGM
code corresponds to the special case of setting $R_b = 1$.

To quantize a length-$n$ binary source vector $\mathbf{s}$ using the compound construction,
an encoder finds an assignment for the $m$ bits in the middle layer that satisfies the
constraints of the bottom LDPC code. Formally, we can denote the $m$-by-$n$ generator
matrix for the top LDGM code as $\mathbf{G}$ and the $k$-by-$m$ parity check matrix for the
bottom LDPC code as $\mathbf{H}$. Then $\mathbf{q}$ is a codeword of the overall code if $\mathbf{q} = \mathbf{w}\mathbf{G}$
and $\mathbf{H}\mathbf{w}' = \mathbf{0}$ for some assignment of the middle layer, which we denote as $\mathbf{w}$.
Thus, an optimal encoder for $\mathbf{s}$ would find the codeword minimizing the Hamming
distance, $d_H(\mathbf{w}\mathbf{G}, \mathbf{s})$, such that $\mathbf{H}\mathbf{w}' = \mathbf{0}$. Since the vector $\mathbf{w}$ has length $m$, storing
or transmitting $\mathbf{w}$ directly would achieve only compression rate $R_t$. Instead, we can
use the fact that there are only $2^{m-k}$ valid choices for $\mathbf{w}$ to store $\mathbf{w}$ using only $m - k$ bits,
resulting in compression rate $R$. For example, we could store the $(m-k)$-bit information
vector that when encoded with the bottom LDPC code yields $\mathbf{w}$.
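
As a minimal sketch of the exhaustive (optimal) encoder described above, the following Python snippet enumerates every middle-layer assignment $\mathbf{w}$, discards those violating $\mathbf{H}\mathbf{w}' = \mathbf{0}$, and keeps the one minimizing $d_H(\mathbf{w}\mathbf{G}, \mathbf{s})$. The function name `optimal_encode` and the toy matrices are hypothetical illustrations, not taken from the paper, and the search is exponential in $m$, so it is only meant for very small instances.

```python
import itertools
import numpy as np

def optimal_encode(G, H, s):
    """Exhaustively search for w with H w' = 0 (mod 2) minimizing d_H(wG, s).

    G: m-by-n generator matrix of the top LDGM code.
    H: k-by-m parity check matrix of the bottom LDPC code.
    s: length-n binary source vector.
    Returns (best_w, best_distance). Cost grows as 2^m (toy sizes only).
    """
    m = G.shape[0]
    best_w, best_dist = None, None
    for bits in itertools.product((0, 1), repeat=m):
        w = np.array(bits, dtype=np.uint8)
        if np.any(H.dot(w) % 2):            # w violates an LDPC check
            continue
        dist = int(np.sum((w.dot(G) % 2) != s))
        if best_dist is None or dist < best_dist:
            best_w, best_dist = w, dist
    return best_w, best_dist

# Hypothetical toy instance (not from the paper): m = 4, n = 6, k = 2.
G = np.array([[1, 1, 0, 0, 0, 0],
              [0, 1, 1, 0, 0, 0],
              [0, 0, 0, 1, 1, 0],
              [0, 0, 0, 0, 1, 1]], dtype=np.uint8)
H = np.array([[1, 1, 0, 0],
              [0, 0, 1, 1]], dtype=np.uint8)
s = np.array([1, 1, 1, 0, 0, 0], dtype=np.uint8)

w, dist = optimal_encode(G, H, s)
print("best w:", w, "Hamming distance:", dist)
```

In this toy instance only $2^{m-k} = 4$ of the $2^m = 16$ assignments satisfy the checks, which is the counting fact the paragraph above relies on.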




Figure 1. Illustration of the compound code construction, involving an LDGM (top section)
with $\gamma_t = 4$ and an LDPC (bottom section) with $(\gamma_v, \gamma_c) = (2, 4)$.

Random LDPC ensemble: For the bottom LDPC code, we use the standard
$(\gamma_v, \gamma_c)$-regular LDPC ensemble studied by Gallager [?]. Specifically, each of the $m$
variable nodes in the middle layer connects to $\gamma_v$ check nodes in the bottom layer.
Similarly, each of the $k$ check nodes in the bottom layer connects to $\gamma_c$ variable nodes
in the middle layer. For convenience, we restrict ourselves to even check degrees $\gamma_c$.
Note that these degrees are linked to the rate via the relation $\gamma_v/\gamma_c = 1 - R_b$. A random
LDPC code $C_b = C_b(\gamma_v, \gamma_c)$ is generated by choosing uniformly from this ensemble.

Random LDGM ensemble: For the top LDGM code, each of the $n$ checks at the
top is connected to $\gamma_t$ variable nodes in the middle layer chosen uniformly
at random. This leads to a Poisson degree distribution on the information bits and
makes the resulting distribution of a random codeword easy to characterize:

Lemma 1. Let $\mathbf{G}$ be a random generator matrix obtained by placing $\gamma_t$ ones in each
column uniformly at random. Then for any vector $\mathbf{w} \in \{0,1\}^m$ with a fraction of $v$
ones, the distribution of the corresponding codeword $\mathbf{w}\mathbf{G}$ is Bernoulli($\delta(v; \gamma_t)$) where

$$\delta(v; \gamma_t) = \frac{1}{2}\left[1 - (1 - 2v)^{\gamma_t}\right]. \qquad (1)$$
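
To make Lemma 1 concrete, the sketch below evaluates the closed form $\delta(v; \gamma_t) = \frac{1}{2}[1 - (1 - 2v)^{\gamma_t}]$ and compares it with a Monte Carlo estimate. The simulation assumes that each output bit XORs $\gamma_t$ positions of $\mathbf{w}$ drawn uniformly with replacement; this sampling model, the function names, and the parameter values are illustrative assumptions rather than details fixed by the paper.

```python
import numpy as np

def delta(v, gamma_t):
    """Probability that a codeword bit equals one, per equation (1)."""
    return 0.5 * (1.0 - (1.0 - 2.0 * v) ** gamma_t)

def empirical_delta(v, gamma_t, m=1000, n=20000, seed=0):
    """Monte Carlo estimate of the fraction of ones in wG.

    Assumption of this sketch: every one of the n checks picks gamma_t
    middle-layer positions uniformly at random, with replacement.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(m, dtype=np.int64)
    w[: int(round(v * m))] = 1                      # w has a fraction v of ones
    idx = rng.integers(0, m, size=(n, gamma_t))     # gamma_t links per check
    bits = w[idx].sum(axis=1) % 2                   # modulo-2 sum per output bit
    return float(bits.mean())

print(delta(0.1, 4), empirical_delta(0.1, 4))
```

The two printed values should agree up to sampling noise; note also that $\delta(v; \gamma_t)$ is small when $v$ is small, which is the source of the codeword dependence discussed in Section 3.2.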

3 Main Results 

Although our methods apply to the compound construction more generally, we
state our main result in application to the special case with $R_t = 1$ and $R_b = R$.
For these choices, we can guarantee that our compound construction approaches the
optimal rate-distortion trade-off as the blocklength $n$ tends to infinity using finite
choices of degrees in our LDGM/LDPC construction.$^1$

Theorem 1. Consider an arbitrary rate distortion pair $(D, R(D))$. For any $\Delta > 0$,
there exist a finite LDGM degree $\gamma_t(\Delta, D)$ and an LDPC code with finite degrees
$\gamma_v(\Delta, D)$ and $\gamma_c(\Delta, D)$ such that a randomly chosen code with rate $R = R(D) + \Delta$ in
the associated LDGM ensemble achieves distortion $D$ with probability $1 - \exp(-cn)$
for some constant $c$.

As a particular example of our results illustrated later in Fig. 2, the degree choices
$\gamma_t = 4$, $\gamma_c = 8$ for rate $R(D) = 1/2$ are sufficient to make the gap $\Delta$ zero (within the
precision of our numerical calculations). The proof of Theorem 1 consists of several
steps, which we motivate and describe in the following text. Proofs of these auxiliary
results are provided in the appendix.

3.1 Expected Number of Good Codewords 

For a length-$n$ code $C$ and a source vector $\mathbf{s}$, we define $z(C, \mathbf{s}, D)$ to be the number
of codewords that are within Hamming distance $Dn$ of $\mathbf{s}$. Specifically, let $x_i(C, \mathbf{s}, D)$
be 1 if the $i$th codeword in the compound code $C$ is within Hamming distance $Dn$
of the source $\mathbf{s}$, and 0 otherwise. Then

$$z(C, \mathbf{s}, D) \triangleq \sum_i x_i(C, \mathbf{s}, D). \qquad (2)$$

Ideally, $z(C, \mathbf{s}, D)$ should be large and there should be many good codewords
provided that the rate exceeds the rate-distortion function: $R > 1 - H_b(D)$. Specifically,
if we consider a random source vector $\mathbf{s}$ and a randomly generated code $C$, then the
probability that the code is successful is simply$^2$ $\Pr[z(D) > 0]$.

Since analyzing this probability directly is generally difficult, most random
coding arguments consider the expectation $E[z(D)]$. For essentially any code (and
in particular the compound construction), it is possible to show that the expected
number of good codewords is large:

Lemma 2.

$$E[z(D)] \ge \frac{1}{(n+1)^2}\, 2^{n\left(R - [1 - H_b(D)]\right)}. \qquad (3)$$
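
As a rough numerical illustration of Lemma 2, the sketch below evaluates the base-2 logarithm of the lower bound, assuming it has the form $2^{n(R - [1 - H_b(D)])}/(n+1)^2$ used in the appendix proof. The function names and the sample rate and distortion values are hypothetical choices made only for this example.

```python
import math

def binary_entropy(p):
    """Binary entropy H_b(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def log2_good_codewords_lower_bound(n, rate, dist):
    """log2 of 2^{n(R - [1 - H_b(D)])} / (n + 1)^2."""
    excess = rate - (1.0 - binary_entropy(dist))
    return n * excess - 2.0 * math.log2(n + 1)

# Hypothetical operating point slightly above the rate-distortion bound.
for n in (100, 1000, 10000):
    print(n, log2_good_codewords_lower_bound(n, rate=0.55, dist=0.11))
```

For any fixed rate above $1 - H_b(D)$, the exponential term eventually dominates the polynomial factor $(n+1)^2$, so the bound grows without limit as $n$ increases even though it can be vacuous at short blocklengths.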

3.2 Typical Number of Good Codewords 

Unfortunately, the fact that the expected number of codewords is large is insuffi- 
cient to show that the code achieves the rate-distortion bound. Rather, in order to 
show that the code is good, we must show that the typical number of good codewords 
is not too far from the expected number of good codewords (or at least non-zero). 

1 Our methods also yield upper bounds on the achievable distortion of the check-regular LDGM
construction ($R_t = R$ and $R_b = 1$). Subsequent work will describe the use of alternative rate pairs $(R_t, R_b)$ for
source and channel coding with side information.

2 When the source and/or code are random, then we drop the indexing of random quantities and write
$x_i(C, \mathbf{s}, D)$ and $z(C, \mathbf{s}, D)$ as random variables $x_i(D)$ and $z(D)$.



For high density codes, this can be done by using Chebyshev's inequality, which
depends on the variance of $z(D)$. For most low density constructions (including our
own), the variance is too large for Chebyshev's inequality to yield a useful bound.
Consequently, we instead use Shepp's second moment method as summarized in the
following proposition:$^3$

Proposition 1. For any nonnegative integer valued random variable $z$, $\Pr[z > 0] \ge E[z]^2 / E[z^2]$.
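
The following small sketch illustrates why a large expectation alone is not enough and how the second moment bound $\Pr[z > 0] \ge E[z]^2 / E[z^2]$ behaves. The two probability mass functions are hypothetical toy distributions chosen for illustration, and exact rational arithmetic is used so that the comparison is not affected by rounding.

```python
from fractions import Fraction

def second_moment_bound(pmf):
    """Return (Pr[z > 0], E[z]^2 / E[z^2]) for a pmf on nonnegative integers.

    pmf: dict mapping each value of z to its probability.
    """
    mean = sum(z * p for z, p in pmf.items())
    second = sum(z * z * p for z, p in pmf.items())
    prob_positive = sum(p for z, p in pmf.items() if z > 0)
    return prob_positive, mean * mean / second

# Toy case 1: the mean is 10, yet z is zero 90% of the time.
rare = {0: Fraction(9, 10), 100: Fraction(1, 10)}
# Toy case 2: a more spread-out distribution with mean 1.
spread = {0: Fraction(1, 2), 1: Fraction(1, 4), 3: Fraction(1, 4)}

for name, pmf in (("rare", rare), ("spread", spread)):
    prob, bound = second_moment_bound(pmf)
    print(name, "Pr[z > 0] =", prob, "bound =", bound)
```

In the first toy case the bound is met with equality, since $z$ takes a single value whenever it is positive; in the second it is strictly smaller than $\Pr[z > 0]$.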

To show that there is typically at least one good codeword, we must upper bound
$E[z(D)^2]$, which can be cast in a more useful form using the following lemma:

Lemma 3.

$$E[z(D)^2] = E[z(D)] + E[z(D)] \cdot \Big\{ \sum_{j \ne 0} \Pr[x_j(D) = 1 \mid x_0(D) = 1] \Big\}. \qquad (4)$$

Lemma 3 illustrates one of the main differences between low density and high
density constructions. Specifically, in a high density construction, each codeword
can be chosen independently, yielding $\Pr[x_j(D) = 1 \mid x_0(D) = 1] = \Pr[x_j(D) = 1]$
and implying $E[z(D)^2] \le E[z(D)] + E[z(D)]^2$. In contrast, for low density codes,
there will usually be some dependence between the codewords. For example, in
the usual LDGM construction, when the information bits $\mathbf{w}$ have low weight, then
the resulting codeword $\mathbf{w}\mathbf{G}$ will also have low weight. Consequently, if the all-zero
codeword is within Hamming distance $Dn$ of the source, then these low weight
codewords probably are as well and so $\Pr[x_j(D) = 1 \mid x_0(D) = 1]$ can be much
larger than $\Pr[x_j(D) = 1]$. In particular, we can bound $\Pr[x_j(D) = 1 \mid x_0(D) = 1]$
by considering the weight of the information sequence $\mathbf{w}_j$ used to generate the $j$th
codeword:

Lemma 4. Let $\mathbf{w}_j\mathbf{G}$ be the $j$th codeword obtained by multiplying a weight-$v_j$ vector
$\mathbf{w}_j$ by a random matrix from the LDGM ensemble and let $x_j(D)$ denote the event
that codeword $j$ is within Hamming distance $Dn$ of a random Bernoulli(1/2) source.
Then for any even degree $\gamma_t$, letting $v = v_j/m$ yields

$$\Pr[x_j(D) = 1 \mid x_0(D) = 1] \le \begin{cases} 1 & \text{if } 0 \le v \le v^*(D; \gamma_t) \\ 2^{-n\,\mathrm{KL}(D \,\|\, \delta(v; \gamma_t))} & \text{otherwise,} \end{cases} \qquad (5)$$

where

$$v^*(D; \gamma_t) = \frac{1}{2}\left[1 - (1 - 2D)^{1/\gamma_t}\right]. \qquad (6)$$
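
To give a feel for the threshold in Lemma 4, the sketch below computes $v^*(D; \gamma_t)$ by solving $\delta(v; \gamma_t) = D$ and evaluates the exponent of the relative entropy bound for a fractional weight $v$. It assumes the weight is expressed as a fraction in $[0, 1/2]$ and that relative entropy is measured in bits; the function names and the sample values $D = 0.11$, $\gamma_t = 4$ (those quoted for Fig. 2) are used purely for illustration.

```python
import math

def delta(v, gamma_t):
    """Probability that a codeword bit is one for fractional weight v."""
    return 0.5 * (1.0 - (1.0 - 2.0 * v) ** gamma_t)

def v_star(dist, gamma_t):
    """Fractional weight at which delta(v; gamma_t) equals the distortion D."""
    return 0.5 * (1.0 - (1.0 - 2.0 * dist) ** (1.0 / gamma_t))

def kl_bernoulli(p, q):
    """KL(p || q) in bits between Bernoulli(p) and Bernoulli(q), 0 < q < 1."""
    total = 0.0
    if p > 0.0:
        total += p * math.log2(p / q)
    if p < 1.0:
        total += (1.0 - p) * math.log2((1.0 - p) / (1.0 - q))
    return total

def lemma4_log2_bound(v, dist, gamma_t, n):
    """log2 of the bound on Pr[x_j = 1 | x_0 = 1]; intended for v in [0, 1/2]."""
    if v <= v_star(dist, gamma_t):
        return 0.0                      # trivial bound: probability at most 1
    return -n * kl_bernoulli(dist, delta(v, gamma_t))

print("v* =", v_star(0.11, 4))
for v in (0.01, 0.05, 0.1, 0.25, 0.5):
    print(v, lemma4_log2_bound(v, dist=0.11, gamma_t=4, n=100))
```

Under these assumptions the exponent is zero up to $v^*$ and then falls toward roughly $-n/2$ at $v = 1/2$, in line with the behavior described in the caption of Fig. 2.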



Lemma 4 shows that $\Pr[x_j(D) = 1 \mid x_0(D) = 1]$ is small whenever the weight of
the information sequence for a codeword is large. So to characterize the sum over
this probability we must consider how many vectors of a given weight in the middle
layer satisfy the constraints of the bottom LDPC code $C_b$. Specifically, we denote
the average (log domain) weight enumerator of $C_b$ (i.e., the rate of codewords of $C_b$
with a given weight) as

$$A_{C_b}(\omega) = \frac{1}{n} \log \mathrm{card}\{\mathbf{q} \in C_b \mid \|\mathbf{q}\|_1 = \omega \cdot n\}. \qquad (7)$$

3 Proposition 1 can be established by defining an indicator random variable, $r(D)$, for the event $\{z(D) > 0\}$
and applying the Cauchy-Schwarz inequality to obtain $E[z(D)]^2 = E[z(D)\,r(D)]^2 \le E[z(D)^2] \cdot E[r(D)]$,
which is equivalent to the desired result.

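Intuitively, by combining (7) with Lemma 4 we can bound the term in braces of (4):

$$\sum_{j \ne 0} \Pr[x_j(D) = 1 \mid x_0(D) = 1] \le \sum_{t=1}^{v^*(D;\gamma_t)} 2^{n A_{C_b}(t/n)} + \sum_{t=v^*(D;\gamma_t)}^{n} 2^{n\left[A_{C_b}(t/n) - \mathrm{KL}(D \,\|\, \delta(t/n;\, \gamma_t))\right]}. \qquad (8)$$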

Formally, we can use this idea to obtain the following result: 

Theorem 2. Consider a sequence of rate $R$ compound codes of increasing blocklength
$n$. Suppose that the following inequality holds for all sufficiently large blocklengths:

$$R - [1 - H_b(D)] \ge \frac{1}{n} \log \Big\{ \sum_{t=1}^{v^*(D;\gamma_t)} 2^{n A_{C_b}(t/n)} + \sum_{t=v^*(D;\gamma_t)}^{n} 2^{n\left[A_{C_b}(t/n) - \mathrm{KL}(D \,\|\, \delta(t/n;\, \gamma_t))\right]} \Big\}. \qquad (9)$$

Then the probability that a code in the sequence fails to quantize a source with
distortion at most $Dn$ goes to zero as $n \to \infty$.

3.3 Reducing Dependency Between Codewords 

The bracketed term on the RHS of (9) corresponds to the excess rate required
beyond the minimum $1 - H_b(D)$ and is plotted in Fig. 2 for the compound code in
Fig. 1. The first term represents the number of low weight codewords of the bottom
code. Since the bound from Lemma 4 does not become active until weight $v^*(D; \gamma_t)$,
making the first term negligible requires choosing the LDPC ensemble so that the
minimum distance is greater than the weight $v^*(D; \gamma_t)$ resulting from the choice of
the degree $\gamma_t$ in the LDGM ensemble. The exponent of the second term in (9) is the
sum of the weight enumerator and the bound from Lemma 4. For this term to be
negligible, the bottom LDPC code must have a weight enumerator that grows less
quickly than the error exponent in (5).

Using the exact formula for the asymptotic weight enumerator of regular LDPC
codes developed by Litsyn and Shevelev [?], it is possible to prove the following
result:$^4$

Proposition 2. There exist choices for $\gamma_t$, $\gamma_v$, and $\gamma_c$ such that the term in braces
in (9) becomes negligible.



4 The proof essentially requires showing that the sum of the weight enumerator and the bound from
Lemma 4 is negative for all weights in $[v^*(D; \gamma_t), 1/2]$. This can be done by checking the appropriate
derivatives of the sum and is omitted for brevity.

Figure 2. Log of bounds and weight enumerator for $R = 1/2$, $\gamma_t = 4$, $\gamma_c = 8$, at distortion
$D \approx 0.11$, normalized by the blocklength $n$. The relative entropy bound from (5) is zero
for weights below $v^*(D; \gamma_t)$ and then quickly goes to $2^{-n/2}$. The (log domain) weight
enumerator for a regular rate 1/2 LDPC code is negative for weights below the minimum
distance and then rises to $2^{n/2}$. As long as the relative entropy bound is stronger than the
weight enumerator, the excess rate in (9) of Theorem 2 will be negligible.

4 Concluding Remarks 

In this paper, we proposed a new construction for low density source codes and 
introduced tools to analyze low density generator matrix codes. As stated in Lemma 4
and illustrated in Fig. 2, our main insight was that the source coding performance
of a low density code can be bounded by considering the weight of the codewords. 
Thus, by using a compound code to control the weight spectrum we obtained codes 
that approach the rate-distortion function. A future paper will describe and analyze 
these types of compound constructions in application to source and channel coding 
with side information. 

A Proofs 

Proof of Lemma 1. By construction of the LDGM ensemble for $\mathbf{G}$, each bit of the
codeword $\mathbf{w}\mathbf{G}$ is independent of the others and is the modulo-2 sum of $\gamma_t$ randomly
and independently selected bits of $\mathbf{w}$. So the resulting codeword has a Bernoulli
distribution and all that remains is to determine the probability that a given bit is
one, which we denote as $\delta(v; \gamma_t)$.

For any output bit, let the random variable $e_i$ denote whether the $i$th one in a
column of $\mathbf{G}$ occurs in a position where $\mathbf{w}$ has a one (i.e., $e_i$ is the value of the
variable node connected to the $i$th link of a given check node at the top of Fig. 1).
Then $\delta(v; \gamma_t)$ is exactly the probability that $\sum_{i=1}^{\gamma_t} e_i$ is odd. Letting $A_v(z)$ denote
the generating function (i.e., the $z$-transform) of $\sum_{i=1}^{\gamma_t} e_i$ yields

$$\begin{aligned}
(1 - \delta(v; \gamma_t)) - \delta(v; \gamma_t) = A_v(z = -1) &= \prod_{i=1}^{\gamma_t} \left( \Pr[e_i = 0] + z^{-1} \Pr[e_i = 1] \right) \Big|_{z=-1} \qquad (10) \\
&= \left( 1 - v + v \cdot z^{-1} \right)^{\gamma_t} \Big|_{z=-1} = (1 - 2v)^{\gamma_t}. \qquad (11)
\end{aligned}$$

Here the left-hand side is the probability that the sum is even minus the probability
that it is odd. Equating the leftmost term of (10) and the rightmost term of (11) and solving for
$\delta(v; \gamma_t)$ yields the desired result. □



Proof of Lemma 2.

$$\begin{aligned}
E[z(D)] = E\Big[\sum_i x_i(D)\Big] &= \sum_i E[x_i(D)] = \sum_i \Pr[d_H(\mathbf{q}_i, \mathbf{s}) \le Dn] \qquad (12) \\
&\ge 2^{nR} \cdot \frac{2^{-n\,\mathrm{KL}(D \| 1/2)}}{(n+1)^2} = 2^{nR} \cdot \frac{2^{-n[1 - H_b(D)]}}{(n+1)^2} = \frac{2^{n\{R - [1 - H_b(D)]\}}}{(n+1)^2}. \qquad (13)
\end{aligned}$$

The first line follows by repeatedly expanding the definition of the random variables
$z(D)$ and $x_i(D)$. For the next line, we lower bound the probability that a given
codeword $\mathbf{q}_i$ is within distortion $Dn$ using standard large deviations results
(Theorem 12.1.4, [4]) and sum over the $2^{nR}$ codewords. Note that nothing in this argument
depends on the actual code construction itself (except for the number of codewords). □



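Proof of Lemma 3.

$$\begin{aligned}
E[z(D)^2] &= E\Big[\sum_i \sum_j x_i(D)\, x_j(D)\Big] \\
&= E[z(D)] + \sum_i \sum_{j \ne i} E[x_i(D)\, x_j(D)] \qquad (14) \\
&= E[z(D)] + \sum_{\mathbf{s}} \sum_i \sum_{j \ne i} \Pr[d_H(\mathbf{q}_i, \mathbf{s}) \le Dn,\ d_H(\mathbf{q}_j, \mathbf{s}) \le Dn]\, \Pr[\boldsymbol{\mathsf{s}} = \mathbf{s}] \qquad (15) \\
&= E[z(D)] + \sum_{\mathbf{s}} \sum_i \sum_{j \ne i} \Pr[d_H(\mathbf{q}_i \oplus \mathbf{q}_j, \mathbf{s} \oplus \mathbf{q}_i) \le Dn,\ d_H(\mathbf{0}, \mathbf{s} \oplus \mathbf{q}_i) \le Dn]\, \Pr[\boldsymbol{\mathsf{s}} = \mathbf{s}] \qquad (16) \\
&= E[z(D)] + \sum_{\mathbf{s}} \sum_i \sum_{j' \ne 0} \Pr[d_H(\mathbf{q}_{j'}, \mathbf{s}') \le Dn,\ d_H(\mathbf{0}, \mathbf{s}') \le Dn]\, \Pr[\boldsymbol{\mathsf{s}} = \mathbf{s}] \qquad (17) \\
&= E[z(D)] + \sum_i \sum_{j \ne 0} \Pr[x_j(D) = 1,\ x_0(D) = 1] \qquad (18) \\
&= E[z(D)] + \Big\{ \sum_i \Pr[x_0(D) = 1] \Big\} \cdot \Big\{ \sum_{j \ne 0} \Pr[x_j(D) = 1 \mid x_0(D) = 1] \Big\}. \qquad (19)
\end{aligned}$$

To obtain (14) we consider the diagonal terms separately from the off-diagonal terms,
and note that $E[x_i(D)^2] = E[x_i(D)]$ since the $x_i(D)$ are indicator variables. Next,
we apply the definition of $x_i(D)$ to get (15) and then add $\mathbf{q}_i$ to each side of the
$d_H(\cdot,\cdot)$ terms to obtain (16). Since the code is linear, adding the codewords $\mathbf{q}_i$ and
$\mathbf{q}_j$ yields another codeword which we denote $\mathbf{q}_{j'}$. This observation combined with
writing $\mathbf{s}' = \mathbf{s} \oplus \mathbf{q}_i$ yields (17). To go from (17) to (18), we note that for a uniformly
random source, $\Pr[\boldsymbol{\mathsf{s}} = \mathbf{s}] = \Pr[\boldsymbol{\mathsf{s}} = \mathbf{s}']$. Finally, to obtain the desired result from (19),
we note that the distribution of $x_i(D)$ is independent of $i$ and hence $E[x_i(D)] = E[x_0(D)]$. □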

Proof of Lemma 4. Let $v = v_j/m$ denote the fractional weight of $\mathbf{w}_j$. We focus on the case
when $\delta(v; \gamma_t) > D$. Solving this relation for $v$ yields the formula for $v^*(D; \gamma_t)$ in (6),
and so $\delta(v; \gamma_t) \le D$ corresponds to the trivial bound in the top case of (5). Thus, for
$\delta(v; \gamma_t) > D$ we have

$$\begin{aligned}
\Pr[x_j(D) = 1 \mid x_0(D) = 1] &\le \Pr[d_H(\mathbf{q}_j, \mathbf{s}) \le Dn \mid d_H(\mathbf{s}, \mathbf{0}) \le Dn] \qquad (20) \\
&\overset{(a)}{\le} \max_{t \le Dn} \Pr[d_H(\mathbf{q}_j, 1^t 0^{n-t}) \le Dn] \\
&\overset{(b)}{\le} \Pr[d_H(\mathbf{q}_j, 0^n) \le Dn] \\
&\overset{(c)}{\le} 2^{-n\,\mathrm{KL}(D \,\|\, \delta(v; \gamma_t))}. \qquad (21)
\end{aligned}$$

We obtain (20) from the definition of the random variable $x_j(D)$. For (a), since $\mathbf{q}_j$
is a Bernoulli sequence, without loss of generality we can imagine that all the ones
in $\mathbf{s}$ occur at the start of the sequence. To obtain an upper bound, we put in as
many such ones as required to maximize the desired probability. In (b), we note that
$\delta(v; \gamma_t) \le 1/2$ implies that it is more likely that a given position of $\mathbf{q}_j$ is zero than
one, so $t = 0$ gives the largest value for the maximization. Finally, to obtain (c),
we apply Sanov's Theorem (Theorem 12.1.4, [4]). Note that the reason we required
$\delta(v; \gamma_t) > D$ originally is that this condition is required by Sanov's Theorem in
(c). □

Before proving Theorem 2 we require the following lemma:

Lemma 5. For a compound code that satisfies (9), $\Pr[z(D) > 0] \ge (1/2) \cdot (n+1)^{-2}$.

Proof. First, assume that $\sum_{j \ne 0} \Pr[x_j(D) \mid x_0(D)] \ge 1$, because if this is not the case
then Proposition 1 combined with Lemma 3 immediately implies that $\Pr[z(D) > 0] \ge 1/2$ and the proof is complete.
Therefore continuing from the assumption that $\sum_{j \ne 0} \Pr[x_j(D) \mid x_0(D)] \ge 1$ yields

$$\begin{aligned}
\Pr[z(D) > 0] &\overset{(a)}{\ge} \frac{E[z(D)]^2}{E[z(D)^2]} \overset{(b)}{=} \frac{E[z(D)]^2}{E[z(D)]\,\{1 + \sum_{j \ne 0} \Pr[x_j(D) \mid x_0(D)]\}} \qquad (22) \\
&\overset{(c)}{\ge} \frac{E[z(D)]^2}{2 \cdot E[z(D)] \cdot \{\sum_{j \ne 0} \Pr[x_j(D) \mid x_0(D)]\}} = \frac{E[z(D)]/2}{\sum_{j \ne 0} \Pr[x_j(D) \mid x_0(D)]} \qquad (23) \\
&\overset{(d)}{\ge} \frac{E[z(D)]/2}{2^{n\{R - [1 - H_b(D)]\}}} \overset{(e)}{\ge} \frac{2^{n\{R - [1 - H_b(D)]\}}}{2(n+1)^2 \cdot 2^{n\{R - [1 - H_b(D)]\}}} = \frac{1}{2(n+1)^2}, \qquad (24)
\end{aligned}$$

where (a) follows from Proposition 1, (b) comes from Lemma 3, (c) follows from the
assumption in the first sentence, (d) comes from (9), and (e) comes from Lemma 2.

□



Proof of Theorem 2. Lemma 5 tells us that the probability that at least one codeword is
found within distortion $D$ is at least $(1/2)/(n+1)^2$, i.e., there is at least a small
chance that a good codeword exists. The key insight of the remainder of the proof is
that while $(1/2)/(n+1)^2$ may be small, it is not exponentially small. Hence if we can
show that the distortion for a compound code is concentrated near its typical value
except with some exponentially small probability, then Lemma 5 immediately implies
that the event $\{z(D) > 0\}$ must correspond to the typical distortion. To prove
exponential concentration, we show that the actual error probability, $\Pr[z(D) = 0]$, is
smaller than $e^{-cn}$ for some constant $c$ using martingale arguments [?, ?].

Specifically, we define a Doob martingale $m_i(C_b)$ that is the expected value of the
distortion between the best codeword and the source (conditioned on the bottom
code $C_b$) when the first $i$ columns of the generator matrix $\mathbf{G}$ (i.e., the connections
from the first $i$ checks to their respective variables) of the top code in Fig. 1 have been
revealed. Going from step $i$ to $i + 1$ and revealing check $i + 1$ can only change the value
of the martingale $m_i(C_b)$ by at most 1. Hence, by the Azuma-Hoeffding inequality,
the probability that a sample path of the martingale differs from its expected value
by more than $\epsilon$ is less than $2e^{-n\epsilon}$.

Since Lemma 5 shows that $\Pr[z(D) > 0]$ is at least an inverse
polynomial (and hence not exponentially small), the event $\{z(D) > 0\}$ must
determine the expected value of the martingale. Therefore other events that result in a
distortion larger than $D$ (e.g., $\{z(D) = 0\}$) must be exponentially small. □

Proof of Theorem 1. Combining Theorem 2 with Proposition 2 establishes this result. □